AITopics | function approximation converge

Reinforcement Learning with Function Approximation Converges to a Region

Neural Information Processing SystemsApr-6-2023, 16:54:04 GMT

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program.

algorithm, function approximation converge, reinforcement learning

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)

Add feedback

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Neural Information Processing SystemsDec-31-2001

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.

algorithm, sarsa, trajectory, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Neural Information Processing SystemsDec-31-2001

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.

algorithm, sarsa, trajectory, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Neural Information Processing SystemsDec-31-2001

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation ofeither the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.For example, fitted value iteration can diverge even for Markov processes [2]; Q-Iearning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(O) can oscillate between multiple policies with different value functions [4].

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.94)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology: